---
title: "URL Handling in Python"
date: 2018-11-21
categories:
- python
tags:
---

<div id="content">
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#org7a2e2b6">Splitting a URL</a></li>
<li><a href="#org530be0f">URLEncode</a>
<ul>
<li><a href="#orgecd7586">Combining a dict into a key-value pair string (form-encoded)</a></li>
<li><a href="#org91394bf">The quote function</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div class="outline-2" id="outline-container-org7a2e2b6">
<h2 id="org7a2e2b6">Splitting a URL</h2>
<div class="outline-text-2" id="text-org7a2e2b6">
<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">! python3</span>
<span style="font-weight: bold;">import</span> urllib.parse
<span style="font-weight: bold; font-style: italic;">url</span> = <span style="font-style: italic;">'http://www.bing.com:80/index.html'</span> <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">a complete URL</span>

<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">splittype splits off the protocol (scheme)</span>
<span style="font-weight: bold; font-style: italic;">t</span>,<span style="font-weight: bold; font-style: italic;">rest</span> = urllib.parse.splittype(url)
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">t = http</span>
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">rest = //www.bing.com:80/index.html</span>

<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">splithost splits off the host</span>
<span style="font-weight: bold; font-style: italic;">host</span>, <span style="font-weight: bold; font-style: italic;">rest</span> = urllib.parse.splithost(rest)
<span style="font-weight: bold;">print</span>(host) <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">www.bing.com:80</span>
<span style="font-weight: bold;">print</span>(rest) <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">/index.html</span>

<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">splitport splits off the port number</span>
<span style="font-weight: bold; font-style: italic;">host</span>, <span style="font-weight: bold; font-style: italic;">port</span> = urllib.parse.splitport(host)
<span style="font-weight: bold;">print</span>(host) <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">www.bing.com</span>
<span style="font-weight: bold;">print</span>(port) <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">80</span>
</pre>
</div>
<p>
Results for other test inputs:
</p>
<table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups">
<colgroup>
<col class="org-left"/>
<col class="org-left"/>
<col class="org-left"/>
<col class="org-left"/>
</colgroup>
<thead>
<tr>
<th class="org-left" scope="col">Input</th>
<th class="org-left" scope="col">splittype</th>
<th class="org-left" scope="col">splithost</th>
<th class="org-left" scope="col">splitport</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">/home/index</td>
<td class="org-left">(None,/home/index)</td>
<td class="org-left">(None,/home/index)</td>
<td class="org-left">Exception</td>
</tr>
<tr>
<td class="org-left">../../html/index.html</td>
<td class="org-left">(None,../../html/index.html)</td>
<td class="org-left">(None, ../../html/index.html)</td>
<td class="org-left">Exception</td>
</tr>
</tbody>
</table>
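<p>
The <code>splittype</code>, <code>splithost</code> and <code>splitport</code> helpers have been deprecated since Python 3.8. As a minimal sketch of the same split, assuming the documented <code>urllib.parse.urlsplit</code> is an acceptable substitute, the example below handles both the full URL and the relative paths from the table without raising. The helper name <code>split_url</code> is hypothetical and not part of the original post.
</p>

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (scheme, hostname, port, path) for url.

    split_url is a hypothetical helper name used only for illustration.
    """
    parts = urlsplit(url)
    # hostname and port are None when the URL has no network location;
    # port is returned as an int, not a string.
    return parts.scheme, parts.hostname, parts.port, parts.path


print(split_url('http://www.bing.com:80/index.html'))
# ('http', 'www.bing.com', 80, '/index.html')
print(split_url('/home/index'))
# ('', None, None, '/home/index')
print(split_url('../../html/index.html'))
# ('', None, None, '../../html/index.html')
```

<p>
Unlike the chained <code>split*</code> calls, a missing host yields <code>None</code> rather than an exception, and a missing scheme is an empty string.
</p>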
</div>
</div>
<div class="outline-2" id="outline-container-org530be0f">
<h2 id="org530be0f">URLEncode</h2>
<div class="outline-text-2" id="text-org530be0f">
<blockquote>
<p>
<a href="https://stackoverflow.com/questions/5607551/how-to-urlencode-a-querystring-in-python">https://stackoverflow.com/questions/5607551/how-to-urlencode-a-querystring-in-python</a>
</p>
</blockquote>
</div>
<div class="outline-3" id="outline-container-orgecd7586">
<h3 id="orgecd7586">Combining a dict into a key-value pair string (form-encoded)</h3>
<div class="outline-text-3" id="text-orgecd7586">
<p>
Special characters in the keys and values are percent-encoded automatically, and spaces become <code>+</code>.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">python2</span>
<span style="font-weight: bold;">import</span> urllib
<span style="font-weight: bold; font-style: italic;">f</span> = { <span style="font-style: italic;">'eventName'</span> : <span style="font-style: italic;">'myEvent'</span>, <span style="font-style: italic;">'eventDescription'</span> : <span style="font-style: italic;">'cool event'</span>}
urllib.urlencode(f)
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">result:</span>
<span style="font-style: italic;">'eventName=myEvent&amp;eventDescription=cool+event'</span>

<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">python3</span>
<span style="font-weight: bold;">import</span> urllib.parse
urllib.parse.urlencode(f)
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">result:</span>
<span style="font-style: italic;">'eventName=myEvent&amp;eventDescription=cool+event'</span>
</pre>
</div>
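<p>
A short Python 3 sketch of two related cases: a value that is a list, and decoding the query string back into a dict. The <code>tag</code> key and its values are assumptions added for illustration; <code>doseq=True</code> tells <code>urlencode</code> to emit one pair per list element.
</p>

```python
from urllib.parse import urlencode, parse_qs

# 'tag' is a made-up key used only to show how list values are handled.
params = {
    'eventName': 'myEvent',
    'eventDescription': 'cool event',
    'tag': ['a', 'b'],
}

# doseq=True expands the list into repeated tag=... pairs.
query = urlencode(params, doseq=True)
print(query)
# eventName=myEvent&eventDescription=cool+event&tag=a&tag=b

# parse_qs reverses the encoding; every value comes back as a list.
decoded = parse_qs(query)
print(decoded['eventDescription'])
# ['cool event']
```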
</div>
</div>
<div class="outline-3" id="outline-container-org91394bf">
<h3 id="org91394bf">The quote function</h3>
<div class="outline-text-3" id="text-org91394bf">
<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">quote_plus is similar to quote; "plus" means spaces are turned into + signs</span>
<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">python2</span>
&gt;&gt;&gt; urllib.quote_plus(<span style="font-style: italic;">'string_of_characters_like_these:$#@=?%^Q^$'</span>)
<span style="font-style: italic;">'string_of_characters_like_these%3A%24%23%40%3D%3F%25%5EQ%5E%24'</span>

<span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">python3</span>
<span style="font-weight: bold;">import</span> urllib.parse
urllib.parse.quote_plus(...)
</pre>
</div>
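<p>
To make the difference between the two functions concrete, here is a small Python 3 sketch. The sample string is an assumption chosen to contain a space, a slash and an ampersand. With default arguments, <code>quote</code> leaves <code>/</code> unescaped and encodes a space as <code>%20</code>, while <code>quote_plus</code> escapes <code>/</code> and encodes a space as <code>+</code>.
</p>

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Sample input chosen for illustration only.
text = 'cool event/a&b'

quoted = quote(text)            # '/' is in the default safe set
plus_quoted = quote_plus(text)  # no safe characters by default

print(quoted)
# cool%20event/a%26b
print(plus_quoted)
# cool+event%2Fa%26b

# Each encoder has a matching decoder that restores the original text.
print(unquote(quoted) == text)
print(unquote_plus(plus_quoted) == text)
```

<p>
<code>quote</code> fits path segments, where <code>/</code> should survive; <code>quote_plus</code> fits query-string values, which is what <code>urlencode</code> uses by default.
</p>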
</div>
</div>
</div>
</div>
<div class="status" id="postamble">
<p class="date">Date: 2018-11-21</p>
<p class="author">Author: gdme1320</p>
</div>
